Learning to Filter Junk E-Mail from Positive and Unlabeled Examples

Author

  • Karl-Michael Schneider
Abstract

We study the applicability of partially supervised text classification to junk mail filtering, where a given set of junk messages serves as positive examples and the messages received by a user serve as unlabeled examples, but there are no negative examples. Supplying a junk mail filter with a large set of junk mails could result in an algorithm that learns to filter junk mail without user intervention and thus would significantly improve the usability of an e-mail client. We study several learning algorithms that handle the unlabeled examples in different ways and present experimental results.
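
To make the setting concrete, the following is a minimal sketch of one simple baseline for learning from positive and unlabeled examples: train a multinomial naive Bayes classifier on the known junk messages as one class and treat every unlabeled message as a provisional member of the other class. This is an illustrative assumption, not necessarily one of the algorithms studied in the paper. The function names `train_nb` and `classify`, the class names, and the toy messages are all hypothetical.

```python
import math
from collections import Counter


def train_nb(docs_by_class):
    """Fit a multinomial naive Bayes model with add-one (Laplace) smoothing.

    docs_by_class maps a class name to a list of tokenized messages.
    """
    vocab = {tok for docs in docs_by_class.values() for doc in docs for tok in doc}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(tok for doc in docs for tok in doc)
        denom = sum(counts.values()) + len(vocab)
        model[label] = {
            "log_prior": math.log(len(docs) / total_docs),
            "log_like": {tok: math.log((counts[tok] + 1) / denom) for tok in vocab},
        }
    return model


def classify(model, tokens):
    """Return the class with the highest log posterior score.

    Tokens that never appeared in training are ignored.
    """
    def score(label):
        params = model[label]
        known = (params["log_like"][t] for t in tokens if t in params["log_like"])
        return params["log_prior"] + sum(known)

    return max(model, key=score)


# Toy data (hypothetical): known junk is the positive set; the user's inbox
# is unlabeled and may itself contain junk (second message below).
junk = [
    ["cheap", "pills", "offer"],
    ["win", "money", "offer"],
    ["cheap", "money", "now"],
]
unlabeled = [
    ["meeting", "agenda", "monday"],
    ["cheap", "pills", "now"],
    ["project", "report", "draft"],
    ["lunch", "monday"],
]

# Assumption of this baseline: all unlabeled messages are treated as "other".
model = train_nb({"junk": junk, "other": unlabeled})
```

Calling `classify(model, ["cheap", "offer", "money"])` then scores a new message against both classes. Because the unlabeled set can contain junk, this baseline tends to underestimate how junk-like some words are, which is one reason to handle the unlabeled examples more carefully.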

Similar resources

Prologue: A machine learning sampler

You may not be aware of it, but chances are that you are already a regular user of machine learning technology. Most current e-mail clients incorporate algorithms to identify and filter out spam e-mail, also known as junk e-mail or unsolicited bulk e-mail. Early spam filters relied on hand-coded pattern matching techniques such as regular expressions, but it soon became apparent that this is h...

E-mail Filtering Tool

As e-mail becomes one of the most widely used methods of communication, with its ease of use, speed and low cost, it has also become the target of advertisers. Just like “snail” mail, e-mail has become prone to junk e-mail, but unlike “snail” mail, the cost of distributing unsolicited junk e-mail to vast numbers of people is relatively low. This has led to the desire for these unwanted e-mails ...

Algorithm of E-mail Classification Based on Automatic Adapting for User

E-mail classification is an effective way to manage mail, improve processing efficiency and filter junk mail. Extracting the characteristics of an e-mail is the key problem for accurate classification. To give the classification more distinctive characteristic words, IDF (inverse document frequency) is used to further refine the characteristics. The procedure which users deal with E...

Positive and Unlabeled Examples Help Learning

In many learning problems, labeled examples are rare or expensive while numerous unlabeled and positive examples are available. However, most learning algorithms only use labeled examples. Thus we address the problem of learning with the help of positive and unlabeled data given a small number of labeled examples. We present both theoretical and empirical arguments showing that learning algorit...

A Survey on Various Classifiers Detecting Gratuitous Email Spamming

Email has become a major means of communication. Many people use email for personal or professional purposes. Email is an effective, fast and cheap way to communicate, and its importance and usage are growing day by day. It provides a way to easily transfer information globally over the Internet. Due to it the email spamming is increasing day by d...

Publication date: 2004